Preparation of the HapMap genotype data
Not recorded
Abstract
We obtained the merged phase I+II+III release #28 (NCBI build 36, dbSNP b126) HapMap CEU genotype data, consisting of 174 samples and 3,908,761 autosomal SNPs. The genotype files for CEU samples and autosomal SNPs were downloaded from the HapMap data archive [1]. The HapMap project website is no longer available; however, the genotype data can still be retrieved from the FTP server. In a preliminary analysis, we checked the family assignments provided by HapMap by estimating pairwise relatedness with the method presented in [2]. For this analysis, we excluded 209 SNPs known to be indels and 31 SNPs that map to multiple positions, as suggested by the HapMap readme file. Additionally, we excluded 1,158,824 monomorphic SNPs. Because some SNPs fulfilled more than one exclusion criterion, a total of 1,159,062 SNPs were excluded. Based on the remaining 2,749,699 autosomal SNPs, we attempted to estimate the pairwise relatedness of all 174 HapMap CEU samples. The matrix of pairwise relatedness estimates is provided as Additional File 5.
We discovered three clusters of individuals: 81 individuals sharing SNPs in common with all other individuals, and two clusters of 9 and 84 individuals that do not share any SNP with each other. Pairwise relatedness therefore could not be checked between individuals of the latter two clusters. Because we focus on complete and confirmed trio data, we excluded 24 individuals belonging to the 8 trios that include the 9 individuals of the smaller cluster. Further, we identified a first-degree relationship between NA07045, of trio NA06986, NA06997, NA07045, and NA12813, of trio NA12801, NA12812, NA12813. This was also noted in [3]. We excluded all 3 individuals of trio NA06986, NA06997, NA07045 because this family showed slightly higher heterozygosity than the other family. Finally, we excluded 18 individuals who are not members of (complete) trio families. The remaining 129 individuals, belonging to 43 trios, were used for analysis.
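The exclusion bookkeeping described above can be illustrated with a minimal Python sketch. It assumes each exclusion criterion is available as a set of SNP identifiers; the function name `apply_exclusions` and the toy rs identifiers are hypothetical and are not taken from the HapMap files. Because a SNP can satisfy several criteria, the excluded set is the union of the criteria, not the sum of their sizes.

```python
def apply_exclusions(items, criteria):
    """Return (kept, excluded) after applying every exclusion criterion.

    `criteria` maps a reason (str) to the set of items flagged for it.
    An item flagged by several criteria is excluded only once.
    """
    excluded = set()
    for flagged in criteria.values():
        excluded |= flagged & items
    return items - excluded, excluded


# Toy example with hypothetical identifiers: rs2 meets two criteria.
snps = {"rs1", "rs2", "rs3", "rs4", "rs5"}
criteria = {
    "indel": {"rs1"},
    "multiple_positions": {"rs2"},
    "monomorphic": {"rs2", "rs3"},
}
kept, excluded = apply_exclusions(snps, criteria)

# Consistency check of the totals reported in the text.
n_snps = 3_908_761
criterion_hits = 209 + 31 + 1_158_824   # summed per-criterion counts
n_excluded = 1_159_062                  # distinct SNPs excluded
surplus_hits = criterion_hits - n_excluded  # hits beyond the first per SNP
snps_remaining = n_snps - n_excluded

# Sample accounting: 174 samples minus the three exclusion steps.
samples_remaining = 174 - 24 - 3 - 18
```

With the reported numbers, the per-criterion counts sum to two more than the number of distinct excluded SNPs, which is what is expected when a few SNPs meet more than one criterion.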
Additional File 6 contains a detailed list of samples and, where applicable, the reason for exclusion. Based on the 129 individuals (43 trios) included in our study, the 3,908,761 autosomal SNPs were filtered as follows. We excluded 209 SNPs known to be indels and 31 SNPs that map to multiple positions, as mentioned above. Because we focus on a reliable set of SNPs measured in all individuals, i.e. a 100% call rate, we excluded 2,888,347 SNPs not covering all samples. We checked for deviation from Hardy-Weinberg equilibrium by applying an exact test [4] to the 86 founders of the 43 trios and excluded one SNP with p < 10^-6. Since some of the SNPs fulfilled more than one exclusion criterion, 2,888,546
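The call-rate and Hardy-Weinberg filters described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with `None` for a missing call, the names `hwe_exact_p` and `keep_snp` are hypothetical, and the exact test shown is one common formulation (the probability of the observed heterozygote count conditional on the allele counts, summed over all equally or less likely outcomes), which may differ in detail from the test in [4].

```python
from math import exp, lgamma, log


def hwe_exact_p(n_aa, n_ab, n_bb):
    """Two-sided exact Hardy-Weinberg p-value from genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    n_minor = min(2 * n_aa + n_ab, 2 * n_bb + n_ab)  # minor allele count

    def log_prob(n_het):
        # Probability of n_het heterozygotes given n and the allele counts.
        hom_minor = (n_minor - n_het) // 2
        hom_major = n - n_het - hom_minor
        return (lgamma(n + 1) - lgamma(hom_minor + 1)
                - lgamma(n_het + 1) - lgamma(hom_major + 1)
                + n_het * log(2.0)
                + lgamma(n_minor + 1) + lgamma(2 * n - n_minor + 1)
                - lgamma(2 * n + 1))

    observed = exp(log_prob(n_ab))
    # Heterozygote counts must have the same parity as the minor allele count.
    probs = [exp(log_prob(h)) for h in range(n_minor % 2, n_minor + 1, 2)]
    # Sum outcomes no more likely than the observed one (small float tolerance).
    p = sum(q for q in probs if q <= observed * (1 + 1e-9))
    return min(1.0, p)


def keep_snp(genotypes, founder_idx, alpha=1e-6):
    """Keep a SNP only if it has a 100% call rate and founders pass HWE."""
    if any(g is None for g in genotypes):
        return False  # at least one sample has no call
    founders = [genotypes[i] for i in founder_idx]
    counts = [founders.count(k) for k in (0, 1, 2)]
    return hwe_exact_p(counts[0], counts[1], counts[2]) >= alpha
```

Restricting the test to `founder_idx` reflects the text's use of the 86 founders only: offspring genotypes depend on those of their parents, so they are left out of the equilibrium check.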
Similar resources
HapMap filter 1.0: A tool to preprocess the HapMap genotypic data for association studies
The International HapMap Project provides a resource of genotypic data on single nucleotide polymorphisms (SNPs), which can be used in various association studies to identify the genetic determinants for phenotypic variations. Prior to the association studies, the HapMap dataset should be preprocessed in order to reduce the computation time and control the multiple testing problem. T...
Inference of unexpected genetic relatedness among individuals in HapMap Phase III.
The International Haplotype Map Project (HapMap) has provided an essential database for studies of human population genetics and genome-wide association. Phases I and II of the HapMap project generated genotype data across ∼3 million SNP loci in 270 individuals representing four populations. Phase III provides dense genotype data on ∼1.5 million SNPs, generated by Illumina and Affymetrix platfo...
Assessing Accuracy of Genotype Imputation in American Indians
Background: Genotype imputation is commonly used in genetic association studies to test untyped variants using information on linkage disequilibrium (LD) with typed markers. Imputing genotypes requires a suitable reference population in which the LD pattern is known, most often one selected from HapMap. However, some populations, such as American Indians, are not represented in HapMap. In the pr...
The International HapMap Project Web site.
The HapMap Web site at http://www.hapmap.org is the primary portal to genotype data produced as part of the International Haplotype Map Project. In phase I of the project, >1.1 million SNPs were genotyped in 270 individuals from four worldwide populations. The HapMap Web site provides researchers with a number of tools that allow them to analyze the data as well as download data for local analy...
A second generation human haplotype map of over 3.1 million SNPs (Supplementary Information)
S1 The density of common SNPs in the Phase II HapMap and the assembled human genome S2 Analysis of data quality S2.1 Analysis of amplicon structure to genotyping error S2.2 Analysis of genotype discordance from overlap with Seattle SNPs S2.3 Analysis of genotype discordance from fosmid end sequences S2.4 Analysis of monomorphism/polymorphism discrepancies S2.5 Interchromosomal LD S3. Analysis o...